Goto

Collaborating Authors

 output function


Learning Functional Transduction: S.I. Contents

Neural Information Processing Systems

We propose below the proofs of the results presented in the main text. Most of the arguments are adapted from the development proposed in (Zhang, 2013) which goes beyond real or complex-valued RKBS developed in (Zhang et al., 2009; Song et al., 2013) to develop the notion of vector-valued RKBS. In addition, we note that assumptions regarding the properties of the RKBS of interests such as uniform Frรฉchet differentiability and uniform convexity have been further relaxed in other works (Xu and Ye, 2019; Lin et al., 2022) but are here sufficient for our discussion since they guarantee the unicity of a semi-inner product x.,.yB compatible with the norm ||.||B (Giles, 1967). S.1.1 Theoretical results Theorem 1 Theorem 1 gathers for the sake of compactness the definition of a vector-valued reproducing kernel Banach space with the properties of existence and unicity of the kernel K. Proof. For any v PV and u PU, the mapping Oรžร‘ xOpvq,uyU is a bounded linear form in LpBq.



Learning Functional Transduction: S.I. Contents

Neural Information Processing Systems

We propose below the proofs of the results presented in the main text. RKBS developed in (Zhang et al., 2009; Song et al., 2013) to develop the notion of vector-valued (Giles, 1967). " 0, @ j ฤ n, @ u P U (9) which allows us to say that O P RKBS (Corollary 3.2 of Zhang (2013)) that we recall hereafter: We first define for any linear operator We show our result in the case J=1 and can be directly extended to any cardinality J. Specifically, we tested three expressions: Exp. The two first expressions yield similar result in the ADR experiment at an equal compute cost. We also tried a'branch' and'trunk' networks formulation of the model as in DeepONet (Lu T able S.2: Summary of the architectural hyperparameters used to build the Transducer in the four experiments. 'Depth' corresponds to network number of layers, 'MLP dim' to the dimensionality of the hidden layer As stated, we used for all experiments, the same meta-training procedure. T able S.3: Summary of the meta-learning hyperparameters used to meta-train the Transducer in our four Figure S.1: Examples of sampled functions ฮด p xq and ฮฝ px q used to build operators O We train Tranducers for 200K gradient steps. Flow library (Holl et al., 2020) that allows for batched and differentiable simulations of fluid dynamics Figure S.5: Magnitude of the complex coefficients of the Fourier transform of an exemple pair of input and In order to tackle the high-resolution climate modeling experiment, we take inspiration from Pathak et al. (2022), which combines neural operators with the patch splitting L " 12, in order to match number of trainable parameters.




Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power

arXiv.org Machine Learning

Equivariant neural networks encode symmetry as an inductive bias and have achieved strong empirical performance in wide domains. However, their expressive power remains not well understood. Focusing on 2-layer ReLU networks, this paper investigates the impact of equiv-ariance constraints on the expressivity of equivariant and layer-wise equivariant networks. By examining the boundary hyperplanes and the channel vectors of ReLU networks, we construct an example showing that equivariance constraints could strictly limit expressive power. However, we demonstrate that this drawback can be compensated via enlarging the model size. Furthermore, we show that despite a larger model size, the resulting architecture could still correspond to a hypothesis space with lower complexity, implying superior generalizability for equivariant networks.


Interpretable Neural Approximation of Stochastic Reaction Dynamics with Guaranteed Reliability

arXiv.org Machine Learning

Stochastic Reaction Networks (SRNs) are a fundamental modeling framework for systems ranging from chemical kinetics and epidemiology to ecological and synthetic biological processes. A central computational challenge is the estimation of expected outputs across initial conditions and times, a task that is rarely solvable analytically and becomes computationally prohibitive with current methods such as Finite State Projection or the Stochastic Simulation Algorithm. Existing deep learning approaches offer empirical scalability, but provide neither interpretability nor reliability guarantees, limiting their use in scientific analysis and in applications where model outputs inform real-world decisions. Here we introduce DeepSKA, a neural framework that jointly achieves interpretability, guaranteed reliability, and substantial computational gains. DeepSKA yields mathematically transparent representations that generalise across states, times, and output functions, and it integrates this structure with a small number of stochastic simulations to produce unbiased, provably convergent, and dramatically lower-variance estimates than classical Monte Carlo. We demonstrate these capabilities across nine SRNs, including nonlinear and non-mass-action models with up to ten species, where DeepSKA delivers accurate predictions and orders-of-magnitude efficiency improvements. This interpretable and reliable neural framework offers a principled foundation for developing analogous methods for other Markovian systems, including stochastic differential equations.


Probability Distributions Computed by Hard-Attention Transformers

arXiv.org Artificial Intelligence

Most expressivity results for transformers treat them as language recognizers (which accept or reject strings), and not as they are used in practice, as language models (which generate strings autoregressively and probabilistically). Here, we characterize the probability distributions that transformer language models can express. We show that making transformer language recognizers autoregressive can sometimes increase their expressivity, and that making them probabilistic can break equivalences that hold in the non-probabilistic case. Our overall contribution is to tease apart what functions transformers are capable of expressing, in their most common use-case as language models.


SolarBoost: Distributed Photovoltaic Power Forecasting Amid Time-varying Grid Capacity

arXiv.org Artificial Intelligence

This paper presents SolarBoost, a novel approach for forecasting power output in distributed photovoltaic (DPV) systems. While existing centralized photovoltaic (CPV) methods are able to precisely model output dependencies due to uniformity, it is difficult to apply such techniques to DPV systems, as DPVs face challenges such as missing grid-level data, temporal shifts in installed capacity, geographic variability, and panel diversity. SolarBoost overcomes these challenges by modeling aggregated power output as a composite of output from small grids, where each grid output is modeled using a unit output function multiplied by its capacity. This approach decouples the homogeneous unit output function from dynamic capacity for accurate prediction. Efficient algorithms over an upper-bound approximation are proposed to overcome computational bottlenecks in loss functions. We demonstrate the superiority of grid-level modeling via theoretical analysis and experiments. SolarBoost has been validated through deployment across various cities in China, significantly reducing potential losses and provides valuable insights for the operation of power grids. The code for this work is available at https://github.com/DAMO-DI-ML/SolarBoost.